This PR adds a new Wazuh integration for Wazuh decoder rule generation tool by Hasitha9796 · Pull Request #79 · wazuh/integrations

Hasitha9796 · 2026-05-01T06:33:57Z

Summary

This PR adds a new integration named wazuh_decoder_rule_tool — a FastAPI-based tool for analyzing logs, checking existing Wazuh decoder/rule matches through wazuh-logtest, and generating custom decoder and rule XML.

New Features

AI-Powered Generation (Hybrid Approach)

Hybrid architecture: programmatically generates correct Wazuh decoder XML, then uses an LLM to review and improve the OS_Regex patterns
Multiple AI providers: Ollama (local, no rate limits), DashScope (Qwen 3.6 Plus), and OpenRouter
wazuh-logtest integration: every AI generation first checks wazuh-logtest to determine:
- Whether a custom decoder is needed at all
- The correct parent strategy (<program_name> when available, <prematch> otherwise)
- Which fields are already decoded by built-in decoders (skipped automatically)
Priority fallback: Ollama > DashScope > OpenRouter
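
The priority fallback can be pictured as a first-match selection over the configured providers. This is a minimal sketch, not the tool's actual code: `OLLAMA_BASE_URL` appears in the testing instructions, while `DASHSCOPE_API_KEY`, `OPENROUTER_API_KEY`, and the function name `pick_provider` are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

# Priority order taken from the PR description: Ollama > DashScope > OpenRouter.
# The DashScope and OpenRouter variable names are hypothetical.
PROVIDER_ENV_VARS = [
    ("ollama", "OLLAMA_BASE_URL"),
    ("dashscope", "DASHSCOPE_API_KEY"),
    ("openrouter", "OPENROUTER_API_KEY"),
]


def pick_provider(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first configured provider in priority order, or None."""
    env = os.environ if env is None else env
    for name, var in PROVIDER_ENV_VARS:
        if env.get(var, "").strip():
            return name
    return None
```

Passing the environment as a mapping keeps the selection testable without touching the real process environment.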

Enhanced ML Decoder Similarity

Ensemble model combining TF-IDF (exact token matching) + SBERT (semantic similarity)
Configurable weighting (default: 40% TF-IDF, 60% SBERT)
Enhanced tokenization preserving regex patterns
Backward compatible with existing TF-IDF fallback
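
As an illustration of the configurable weighting, the sketch below blends two per-decoder score dictionaries with the default 40/60 split. The function name `ensemble_scores` and the dictionary-based interface are assumptions; the real model wrapper may combine scores differently.

```python
from typing import Dict


def ensemble_scores(
    tfidf: Dict[str, float],
    sbert: Dict[str, float],
    tfidf_weight: float = 0.4,
    sbert_weight: float = 0.6,
) -> Dict[str, float]:
    """Blend per-decoder similarity scores; a missing score counts as 0.0."""
    total = tfidf_weight + sbert_weight
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    names = set(tfidf) | set(sbert)
    return {
        name: (tfidf_weight * tfidf.get(name, 0.0) + sbert_weight * sbert.get(name, 0.0)) / total
        for name in names
    }


# Hypothetical scores for two decoders; nginx has no SBERT score.
scores = ensemble_scores({"sshd": 0.5, "nginx": 0.1}, {"sshd": 1.0})
best = max(scores, key=scores.get)
```

Dividing by the weight total keeps the result on the same scale as the inputs even if the two weights do not sum to 1.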

Improved Decoder Generation

Split decoders: one child decoder per field for better accuracy
Robust prefix generalization (timestamps, IPs, MAC addresses, PIDs)
CEF (Common Event Format) log support with field mapping
Per-field validation explaining which fields will/won't be decoded
Multiple log type handlers: syslog, JSON, key=value, bracketed, Java dash, Android, Palo Alto CSV
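
A rough sketch of prefix generalization for numeric values follows. It assumes the simplest possible rule, replacing every digit run with the OS_Regex token `\d+`, and it does not escape characters that are special in OS_Regex; `generalize_prefix` is a hypothetical name and the tool's own engine handles more cases (timestamps, MAC addresses).

```python
import re

DIGITS = re.compile(r"\d+")


def generalize_prefix(prefix: str) -> str:
    """Replace each run of digits in a literal prefix with the OS_Regex token \\d+."""
    # A function is used as the replacement so the backslash is inserted verbatim.
    return DIGITS.sub(lambda m: r"\d+", prefix)


generalized = generalize_prefix("failed to authenticate from IP 192.168.1.100 port 51234")
```

Because OS_Regex treats an unescaped dot as a literal dot, the generalized IP address keeps its dots as they are.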

Robustness & Reliability

Timeouts on all git subprocess calls (clone, pull, sparse-checkout) to prevent startup hangs
Proper Wazuh OS_Regex validation (no PCRE patterns, correct \. vs . semantics)
Non-blocking SSH with configurable timeouts
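
One way to bound a subprocess call is the `timeout` argument of `subprocess.run`. The sketch below is an assumption about the general approach, not the PR's code: `run_with_timeout` is a hypothetical helper, and a sleeping Python child process stands in for a hung `git clone`.

```python
import subprocess
import sys
from typing import Sequence


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> bool:
    """Run a command; return True on exit code 0, False on failure or timeout."""
    try:
        result = subprocess.run(
            list(cmd),
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return False
    except OSError:
        # Executable not found or not runnable.
        return False
    return result.returncode == 0


# Stand-ins for a fast and a hung git operation.
fast = [sys.executable, "-c", "print('ok')"]
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
```

Returning a boolean lets startup code skip the repository refresh and continue with cached data when the call does not finish in time.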

Included

FastAPI backend with streaming AI responses
Single-page HTML/JS UI with decoder analysis, rule generation, AI generation, and testing
Log analysis using heuristics with regex generation engine for Wazuh OS_Regex compatibility
wazuh-logtest validation (local or remote via SSH)
ML-based decoder similarity (TF-IDF + optional SBERT ensemble)
Rule ML model trained from wazuh-ruleset
Per-field feedback collection for continuous improvement
README with comprehensive setup instructions including AI provider configuration

Testing

The app can be tested locally:

Set up the virtual environment:

cd integrations/wazuh_decoder_rule_tool
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Generate SSL certificates:

mkdir -p certs
openssl req -x509 -newkey rsa:4096 -keyout certs/localhost.key -out certs/localhost.crt -days 365 -nodes -subj "/CN=localhost"

Start the application (with AI):

export OLLAMA_BASE_URL=http://localhost:11434/v1
export OLLAMA_MODEL=llama3.2:3b
uvicorn app.main:app --host 0.0.0.0 --port 8443 --ssl-certfile certs/localhost.crt --ssl-keyfile certs/localhost.key

Access the application via https://localhost:8443.

Connecting to Wazuh VM for wazuh-logtest

export WAZUH_SSH_HOST=192.168.56.10
export WAZUH_SSH_PORT=22
export WAZUH_SSH_USER=vagrant
export WAZUH_SSH_PASSWORD=vagrant

Example Scenario

Paste a log like: May 19 12:34:56 custom-server myapp[1234]: User 'admin' failed to authenticate from IP 192.168.1.100 due to invalid_password
Click Analyze to detect log type and extract fields
Select fields to extract (e.g., user, srcip)
Click Generate for programmatic decoder+rule generation
Click AI Generate for AI-assisted pattern improvement
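
To make the target fields concrete, the following sketch pulls `user` and `srcip` out of the sample log with Python's `re` module. It only illustrates which substrings the decoder should capture; the pattern uses Python regex syntax, not Wazuh OS_Regex, and is not what the tool generates.

```python
import re

LOG = (
    "May 19 12:34:56 custom-server myapp[1234]: User 'admin' failed to "
    "authenticate from IP 192.168.1.100 due to invalid_password"
)

# Python `re` syntax, used only to show the target fields; this is not OS_Regex.
PATTERN = re.compile(
    r"^\w{3}\s+\d+\s[\d:]+\s(?P<hostname>\S+)\s(?P<program>[^\[\s]+)\[(?P<pid>\d+)\]:\s"
    r"User '(?P<user>[^']+)' failed to authenticate from IP (?P<srcip>\d+\.\d+\.\d+\.\d+)"
)

match = PATTERN.search(LOG)
fields = match.groupdict() if match else {}
```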

Copilot

Pull request overview

This PR introduces a new wazuh_decoder_rule_tool integration: a FastAPI-based UI/API for analyzing pasted logs, optionally validating them via wazuh-logtest, and generating Wazuh decoder/rule XML. It also adds an “enhanced” ML decoder-similarity approach (TF‑IDF + SBERT) plus scripts/datasets to train a custom similarity model from Wazuh ruleset test data.

Changes:

Add the FastAPI app’s HTML/JS/CSS frontend and supporting backend utilities for decoder/rule generation workflows.
Add ML enhancements: ensemble similarity model wrapper, dataset builder + training script, and accompanying tests/docs.
Add local datasets and TLS artifacts for local HTTPS testing (currently including private keys).

Reviewed changes

Copilot reviewed 21 out of 26 changed files in this pull request and generated 12 comments.

File	Description
integrations/wazuh_decoder_rule_tool/tests/test_ml_enhanced.py	Adds unit tests for enhanced ML similarity components.
integrations/wazuh_decoder_rule_tool/tests/test_integration.py	Adds a basic integration test for enhanced ML model loading.
integrations/wazuh_decoder_rule_tool/scripts/train_similarity.py	Adds SBERT contrastive training script for decoder similarity.
integrations/wazuh_decoder_rule_tool/scripts/build_dataset.py	Adds script to build training/validation datasets from Wazuh rules-testing suites + feedback.
integrations/wazuh_decoder_rule_tool/requirements.txt	Adds Python dependencies for running the tool (FastAPI/Uvicorn/ML libs).
integrations/wazuh_decoder_rule_tool/README.md	Documents local HTTPS run instructions, remote VM mode, and ML training workflow.
integrations/wazuh_decoder_rule_tool/ML_ENHANCEMENT_SUMMARY.md	Documents ML feature-engineering + ensemble approach and future tuning ideas.
integrations/wazuh_decoder_rule_tool/key.pem	Adds a private key file (should not be committed).
integrations/wazuh_decoder_rule_tool/generated/decoders/local_myapp_decoder_20260307094900.xml	Adds generated decoder XML output artifact.
integrations/wazuh_decoder_rule_tool/generated/decoders/local_myapp_decoder_20260307094544.xml	Adds generated decoder XML output artifact (duplicate-style).
integrations/wazuh_decoder_rule_tool/data/datasets/val.jsonl	Adds validation dataset records for ML training.
integrations/wazuh_decoder_rule_tool/data/datasets/feedback.jsonl	Adds feedback dataset examples used for training/tuning.
integrations/wazuh_decoder_rule_tool/data/datasets/feedback_rejections.jsonl	Adds rejected feedback examples for analysis/training workflows.
integrations/wazuh_decoder_rule_tool/certs/localhost.key	Adds a private TLS key for local HTTPS (should not be committed).
integrations/wazuh_decoder_rule_tool/certs/localhost.crt	Adds a self-signed TLS certificate for local HTTPS.
integrations/wazuh_decoder_rule_tool/cert.pem	Adds a certificate artifact for local HTTPS usage.
integrations/wazuh_decoder_rule_tool/app/wazuh_logtest.py	Adds a helper to run `wazuh-logtest` via SSH (currently hardcoded/inconsistent).
integrations/wazuh_decoder_rule_tool/app/templates/index.html	Adds the single-page HTML UI for the tool.
integrations/wazuh_decoder_rule_tool/app/static/styles.css	Adds styling for the UI.
integrations/wazuh_decoder_rule_tool/app/static/app.js	Adds UI logic for navigation, generate/test flows, ML status, AI generation, feedback, history.
integrations/wazuh_decoder_rule_tool/app/decoder_ml.py	Adds baseline TF‑IDF similarity models + parsing utilities for decoders/rules.
integrations/wazuh_decoder_rule_tool/app/decoder_ml_enhanced.py	Adds enhanced feature engineering + ensemble similarity model + compatibility wrapper.
integrations/wazuh_decoder_rule_tool/.gitignore	Adds ignores for venv/cache/model/repo directories.

+function toggleConditionsRow() {
+  const req = document.getElementById('ruleRequirement').value.trim();
+  document.getElementById('ruleFieldConditionsRow').style.display = req ? 'flex' : 'none';
+  document.getElementById('ruleMatchConditionsRow').style.display = req ? 'flex' : 'none';


+    try:
+        # This might fail if no Wazuh repo is available, but that's OK for this test
+        model = ensure_ml_model_enhanced(force_refresh=False, use_ensemble=True)
+        # If we get here without exception, the function works
+        assert model is not None or model is None  # Either is fine
+        print("✓ ensure_ml_model_enhanced executed successfully")
+        return True
+    except Exception as e:
+        print(f"✗ ensure_ml_model_enhanced failed: {e}")
+        return False
+
+
+if __name__ == "__main__":
+    success = test_ensure_ml_model_enhanced()
+    if success:
+        print("Integration test passed!")
+    else:
+        print("Integration test failed!")


+            parts.extend([self.prematch] * int(prematch_weight))
+        if self.regex:
+            # Extract meaningful tokens from regex
+            regex_tokens = re.findall(r'\[\\w\+\\]|\\\\d\+|\\\\S\+|\\\\w\+', self.regex)


+
+        parts = []
+        if self.name:
+            parts.extend([self.name] * int(name_weight))
+        if self.program_name:
+            parts.extend([self.program_name] * int(program_weight))
+        if self.prematch:
+            parts.extend([self.prematch] * int(prematch_weight))
+        if self.regex:
+            # Extract meaningful tokens from regex
+            regex_tokens = re.findall(r'\[\\w\+\\]|\\\\d\+|\\\\S\+|\\\\w\+', self.regex)
+            parts.extend(regex_tokens * int(regex_weight))
+        if self.order:
+            parts.extend(self.order * int(order_weight))
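
The token pattern in this hunk can be checked in isolation. In a raw string, `\\\\d\+` matches two literal backslashes followed by `d+`, so it finds nothing in a decoder regex that contains the usual single-backslash `\d+`. The sketch below compares it with a single-backslash alternative; `SINGLE_ESCAPED` and the sample decoder regex are assumptions for illustration, not the fix that was eventually committed.

```python
import re

# Pattern as it appears in the reviewed hunk: each `\\\\` matches two literal backslashes.
OVER_ESCAPED = r'\[\\w\+\\]|\\\\d\+|\\\\S\+|\\\\w\+'

# Hypothetical alternative: one literal backslash, a class letter, and a literal plus.
SINGLE_ESCAPED = r'\\[dSw]\+'

# Sample OS_Regex string as it would be read from a decoder XML file.
decoder_regex = r'(\d+) \S+ user=(\w+)'

over_escaped_tokens = re.findall(OVER_ESCAPED, decoder_regex)
single_escaped_tokens = re.findall(SINGLE_ESCAPED, decoder_regex)
```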


+-----BEGIN PRIVATE KEY-----
+MIIJQgIBADANBgkqhkiG9w0BAQEFAASCCSwwggkoAgEAAoICAQDeCJuheTkfwUSK
+shHW/6XR28sohDtaA+BgE5VQhA/dO0A0OD4Y+FHFvwqDZg4j74mZ1s4BBxdercSO
+l1NXmfTJvH0WhY09vSyS3g4N/T1unrtTFUTrC3Dc5ovLAxAUe2AHLGhQcXGWRbTq
+pEL1KEoYG89DSisTjSBOcoM3dE8fnU2Gc7YCvLUh8IpIaYLr0GOiQumAGhxIyWGq


+# Cache directories and ML models
+data/models/
+data/wazuh_repo/
+data/wazuh_ruleset_repo/
+


@@ -0,0 +1,3 @@
+{"log":"03-17 16:13:38.811  1702  2395 D WindowManager: printFreezingDisplayLogsopening app wtoken = AppWindowToken{9f4ef63 token=Token{a64f992 ActivityRecord{de9231d u0 com.tencent.qt.qtl/.activity.info.NewsDetailXmlActivity t761}}}, allDrawn= false, startingDisplayed =  false, startingMoved =  false, isRelaunching =  false","decoder":{"name":"myapp-event","parent":"myapp","prematch":"WindowManager:","regex":"(\\d+-\\d+ \\d+:\\d+:\\d+.\\d+)  \\d+  \\d+ \\S WindowManager: \\S+ \\S+ wtoken = (\\.+) token=(\\.+), allDrawn= (\\S+)","order":["logtime","wtoken","token","allDrawn"],"source_file":"feedback/windowmanager.json"}}
+{"log":"20171223-22:15:33:144|Step_SPUtils|30002312| getTodayTotalDetailSteps = 1514038440000##7013##548365##8661##12836##27176966","decoder":{"name":"myapp-event","parent":"myapp","prematch":"Step_SPUtils","regex":"(\\.+)\\|Step_SPUtils\\|30002312\\| getTodayTotalDetailSteps = (\\.+)","order":["logtime","getTodayTotalDetailSteps"],"source_file":"feedback/pipemetric.json"}}
+{"timestamp": "2026-05-16T08:56:11.647689Z", "approved": true, "log": "May 16 14:22:31 plc-gateway01 scada-engine[2241]: ALERT Modbus unauthorized write request detected from 10.10.50.24 function_code=0x10 register=40123", "extract_fields": ["srcip", "funtion_code"], "notes": "", "decoder": {"name": "myapp-event", "parent": "myapp", "prematch": "scada-engine", "regex": "ALERT\\s+Modbus\\s+unauthorized\\s+write\\s+request\\s+detected\\s+from\\s+(\\d+.\\d+.\\d+.\\d+)\\s+function_code=(\\d+x\\d+)\\s+register=\\d+", "order": ["srcip", "function_code"], "source_file": "feedback/myapp.json"}, "target_text": "myapp-event myapp scada-engine alert\\s+modbus\\s+unauthorized\\s+write\\s+request\\s+detected\\s+from\\s+(\\d+.\\d+.\\d+.\\d+)\\s+function_code=(\\d+x\\d+)\\s+register=\\d+ srcip function_code feedback/myapp.json"}


+{"timestamp": "2026-04-29T05:52:13.354712Z", "approved": false, "app_name": "myapp", "log": "[2026-04-29T04:29:06,056][INFO ][o.o.s.s.c.FlintStreamingJobHouseKeeperTask] [node-1] Starting housekeeping task for auto refresh streaming jobs.", "extract_fields": ["logtime", "loglevel", "message"], "notes": "[(\\d+-\\d+-\\S+:\\d+:\\d+,\\d+)][(\\S+)\\s][\\.+] [\\S+] (\\.+)"}
+{"timestamp": "2026-04-29T08:50:41.323760Z", "approved": false, "app_name": "myapp", "log": "[2026-04-29T04:29:06,056][INFO ][o.o.s.s.c.FlintStreamingJobHouseKeeperTask] [node-1] Starting housekeeping task for auto refresh streaming jobs.", "extract_fields": [], "notes": "It should be corrected like this"}
+{"timestamp": "2026-04-29T08:50:41.368350Z", "approved": false, "app_name": "myapp", "log": "[2026-04-29T04:29:06,056][INFO ][o.o.s.s.c.FlintStreamingJobHouseKeeperTask] [node-1] Starting housekeeping task for auto refresh streaming jobs.", "extract_fields": [], "notes": "It should be corrected like this"}
+{"timestamp": "2026-05-16T08:56:23.312599Z", "approved": false, "app_name": "myapp", "log": "May 16 14:22:31 plc-gateway01 scada-engine[2241]: ALERT Modbus unauthorized write request detected from 10.10.50.24 function_code=0x10 register=40123", "extract_fields": ["srcip", "funtion_code"], "notes": ""}


+For this workspace, the app now defaults to:
+
+```bash
+WAZUH_SSH_HOST=192.168.56.10
+WAZUH_SSH_PORT=22
+WAZUH_SSH_USER=vagrant
+WAZUH_SSH_PASSWORD=vagrant
+```


+WAZUH_HOST = "127.0.0.1"
+WAZUH_PORT = "2222"
+WAZUH_USER = "vagrant"
+
+# read from environment variable
+WAZUH_LOGTEST = os.getenv("WAZUH_LOGTEST_PATH", "/var/ossec/bin/wazuh-logtest")
+
+
+def run_logtest(log_line):
+    cmd = [
+        "ssh",
+        "-p", WAZUH_PORT,
+        f"{WAZUH_USER}@{WAZUH_HOST}",
+        f"sudo {WAZUH_LOGTEST}"
+    ]
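
The hunk above hardcodes the host, port, and user, while the README documents `WAZUH_SSH_*` variables. A minimal sketch of reading the connection settings from the environment with fallback defaults follows; the helper name `build_logtest_cmd` and the default values are assumptions, and only the variable names come from the PR text.

```python
import os
from typing import List, Mapping, Optional


def build_logtest_cmd(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build the ssh command for wazuh-logtest from environment settings."""
    env = os.environ if env is None else env
    # Defaults below are illustrative fallbacks, not values confirmed by the PR.
    host = env.get("WAZUH_SSH_HOST", "127.0.0.1")
    port = env.get("WAZUH_SSH_PORT", "22")
    user = env.get("WAZUH_SSH_USER", "vagrant")
    logtest = env.get("WAZUH_LOGTEST_PATH", "/var/ossec/bin/wazuh-logtest")
    return ["ssh", "-p", str(port), f"{user}@{host}", f"sudo {logtest}"]


cmd = build_logtest_cmd({"WAZUH_SSH_HOST": "192.168.56.10", "WAZUH_SSH_USER": "vagrant"})
```

Building the argument list in one place keeps the README and the helper from drifting apart.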


- Hybrid AI generation: programmatic base XML (guaranteed correct) + AI review for regex improvement
- wazuh-logtest always checked before AI generation to determine parent strategy
- Parent decoder uses <program_name> when log has a decoded program name
- Fields already decoded by built-in decoders are skipped automatically
- AI prompt refocused on reviewing/improving regex patterns instead of writing XML from scratch
- Git subprocess calls now have timeouts to prevent startup hangs
- Updated README with AI provider setup and hybrid approach documentation
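
The parent strategy described here (`<program_name>` when a program name was decoded, `<prematch>` otherwise) could be sketched as below. The function name, arguments, and indentation of the generated XML are assumptions; the tool's real generator is not shown in this page.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def parent_decoder_xml(name: str, program_name: Optional[str], prematch: str) -> str:
    """Build a parent decoder anchored on program_name if known, else on prematch."""
    if program_name:
        anchor = f"  <program_name>{escape(program_name)}</program_name>"
    else:
        anchor = f"  <prematch>{escape(prematch)}</prematch>"
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    return f"<decoder name={quoteattr(name)}>\n{anchor}\n</decoder>"


with_program = parent_decoder_xml("myapp", "myapp", "")
with_prematch = parent_decoder_xml("myapp", None, "WindowManager:")
```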

…ation
- Removed Decoder Generator and Rule Generator sections from HTML
- Moved input fields (appName, logsInput, extractFields, etc.) into AI view
- Removed 'Generate Decoder' and 'Generate Rule' sidebar nav items
- Made 'AI Generate' the default active view
- Cleaned up app.js: removed unused functions (showAnalysis, showXml, syncFeedback, readRulePayload, rule conditions UI, old button handlers)
- Updated history loading and test function to work without decoder view

- Added POST /api/install endpoint to write decoder/rule XML to Wazuh's custom decoders/rules directories (SSH or local)
- Added POST /api/uninstall endpoint to remove installed files
- Added POST /api/logtest/raw endpoint for running wazuh-logtest with arbitrary log samples and returning raw output + parsed fields
- Redesigned Test view with three cards: Installed Decoder (install/uninstall), Test Logs (editable sample input), and wazuh-logtest Output (raw stdout + parsed fields table)
- Added state management storing installed file paths in localStorage
- AI-generated XML is now persisted in JS so it can be installed from the Test view without re-running AI generation

…ailure
- Add generation_mode (auto/decoder_only/rule_only/both) to AI request
- Add validate_with_logtest flag and /api/ai/generate-validated endpoint
- Add _collect_ai_response, _extract_xml_from_ai_response helpers
- Add _validate_ai_decoder_with_logtest for auto-install+test validation
- Refactor _build_ai_prompt: shorter config block, concise ML/logtest context
- Add system prompt for Ollama (system+user roles), fix URL path
- Lower default temperature to 0.05 for more deterministic output
- Default model changed to wazuh-decoder
- UI: generation mode dropdown, validate checkbox, Generate & Validate button
- UI: show validation badge & details in AI output section
- UI: hide rule section when generation_mode=decoder_only

…ndpoint and automate rule group/static field sanitization

…coring, and sigmoid calibration
- Add log-type detection (_detect_log_type) with type-based boosting to bias results toward relevant decoder families (JSON, Windows, syslog, etc.)
- Add regex token overlap scoring (_regex_overlap_score) to boost patterns whose OS_Regex tokens match query log literals
- Add sigmoid confidence calibration for well-calibrated probabilities in [0,1]
- Tune ensemble weights: TF-IDF 0.3, SBERT 0.7 (semantic model is stronger for unseen formats)
- Raise minimum confidence gate to 0.15 to avoid low-confidence noise
- Add fine-tuned SBERT checkpoint loading with graceful fallback
- Enhance tokenizer to preserve more OS_Regex character classes
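
The calibration and gating steps can be sketched as a logistic mapping followed by a threshold. The 0.3/0.7 weights and the 0.15 gate come from the commit message; the `midpoint` and `steepness` parameters and both function names are assumptions, since the commit does not state how the sigmoid is parameterized.

```python
import math
from typing import Optional


def calibrate(raw: float, midpoint: float = 0.5, steepness: float = 10.0) -> float:
    """Map a raw similarity score to the open interval (0, 1) with a logistic curve."""
    return 1.0 / (1.0 + math.exp(-steepness * (raw - midpoint)))


def gated_confidence(tfidf: float, sbert: float, min_conf: float = 0.15) -> Optional[float]:
    """Blend the two scores, calibrate, and drop results below the confidence gate."""
    raw = 0.3 * tfidf + 0.7 * sbert
    conf = calibrate(raw)
    return conf if conf >= min_conf else None
```

Returning `None` below the gate lets the caller report "no similar decoder" instead of surfacing a weak match.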

… Modelfile
- Lower temperature (0.05→0.02) and top_p (0.85→0.80) for more deterministic output
- Increase repeat_penalty (1.15→1.20) and lower top_k (20→15) to reduce repetition
- Add self-validation checklist to catch common errors before output
- Add JSON log decoder and DHCP/MAC address examples
- Fix sshd example to use same decoder name for multiple children
- Add instruction: 'No text before or after' the XML block

…lization
- Default OLLAMA_BASE_URL to http://localhost:11434/v1 so it works without env vars
- Normalize /v1 suffix to prevent double-/v1 404 errors in URL construction
- Add 60s timeout to streaming client with retry on ReadTimeout (up to 3 attempts)
- Add decoder rule: multiple child decoders must use exact same decoder name
- Fix IP regex guidance: do not escape dots in \d+.\d+.\d+.\d+
- Update top_k to 15 and repeat_penalty to 1.20 to match Modelfile tuning
- Improve error messages for network/server issues
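
A sketch of the `/v1` normalization follows, assuming the goal is a base URL that ends in exactly one `/v1` regardless of how the user wrote it. Both function names are hypothetical; `/v1/chat/completions` is the path of Ollama's OpenAI-compatible chat endpoint.

```python
def normalize_base_url(url: str) -> str:
    """Return the base URL with exactly one trailing /v1 and no trailing slash."""
    base = url.strip().rstrip("/")
    # Strip any number of trailing /v1 segments, then add a single one back.
    while base.endswith("/v1"):
        base = base[: -len("/v1")].rstrip("/")
    return base + "/v1"


def chat_completions_url(base_url: str) -> str:
    """Build the chat completions URL from a possibly non-normalized base URL."""
    return normalize_base_url(base_url) + "/chat/completions"
```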

…to dataset builder
- Add load_rejection_records(): convert rejection notes with regex corrections into positive training pairs
- Add augment_with_dropout(): create robustness variants by randomly masking log tokens (15% prob)
- Rejection corrections teach SBERT to distinguish correct from broken regex patterns
- Dropout augmentation teaches model that partial log lines still map to same decoder
- Add structured logging of record counts throughout pipeline
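
Token dropout can be sketched as follows. This version is an illustration that reuses the name from the commit message but is not the committed implementation: it assumes whitespace tokenization and drops tokens outright rather than replacing them with a mask token, and the `seed` parameter is added here for reproducibility.

```python
import random
from typing import Optional


def augment_with_dropout(log: str, p: float = 0.15, seed: Optional[int] = None) -> str:
    """Return a copy of the log with each whitespace token dropped with probability p."""
    rng = random.Random(seed)
    tokens = log.split()
    kept = [tok for tok in tokens if rng.random() >= p]
    # Never return an empty string for a non-empty log: keep one token at random.
    if tokens and not kept:
        kept = [rng.choice(tokens)]
    return " ".join(kept)
```

Each variant is paired with the same decoder as the original log, which is what teaches the model that partial lines map to the same target.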

…nting to SBERT training
- 5 epochs with best-checkpoint saving (by validation AUC)
- Larger batch size (64 configurable) for better in-batch negatives with MultipleNegativesRankingLoss
- Hard-negative augmentation: pair logs with categorically distinct decoders (30% ratio)
- Token dropout data augmentation for robustness on partial input
- Early stopping with patience=2 epochs
- Add binary evaluator with both positive and negative pairs for AUC measurement
- Configurable training device (default CPU to avoid MPS OOM with Ollama)
- Copy best checkpoint to 'final' directory for easy model loading
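
The interaction between best-checkpoint saving and early stopping with patience=2 can be shown without any training code. The sketch below takes a sequence of validation AUC values in place of real epochs; the function name is hypothetical and the AUC values in the example are made up.

```python
from typing import Iterable, Tuple


def train_with_early_stopping(val_aucs: Iterable[float], patience: int = 2) -> Tuple[int, float]:
    """Return (best_epoch, best_auc), stopping after `patience` epochs without improvement."""
    best_epoch, best_auc, stale = -1, float("-inf"), 0
    for epoch, auc in enumerate(val_aucs, start=1):
        if auc > best_auc:
            # A real training loop would save the checkpoint here.
            best_epoch, best_auc, stale = epoch, auc, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, best_auc


# Illustrative AUC values: training stops after epoch 4 and never reaches epoch 5.
result = train_with_early_stopping([0.70, 0.80, 0.79, 0.78, 0.95])
```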

The sidebar defaulted to AI Generate as active, but the corresponding #view-ai div was missing the 'active' class, so CSS display:none kept the entire AI generation page blank on initial load.

…egex instruction
The AI model consistently escapes dots (\.) in regex patterns because it is trained on PCRE, where this is correct. Wazuh OS_Regex treats '.' as a literal character and \. as "any character", so \. is the wrong pattern for a literal dot. Fix:
- Add _sanitize_decoder_xml_osregex() that strips \. → . in generated XML
- Apply it in _extract_xml_from_ai_response and the final return
- Strengthen the Modelfile and prompt instruction with WRONG/RIGHT examples to make the rule impossible to miss

- Fix sanitization regex: r'\.' was matching any char after backslash (breaking \d, \w, etc.). Use r'\\.' to match only backslash + literal dot.
- Add Example 7 to Modelfile showing correct TrafficLog IP extraction with unescaped dots in OS_Regex
- Strengthen prompt WRONG/RIGHT examples for IP regex

The streaming /api/ai/generate endpoint returns raw AI text without server-side processing, so escaped dots (\.) pass through to the browser. Add sanitizeOsRegex() in app.js that strips \. → . client-side after XML extraction, covering both the streaming and validated endpoints.

Add a hand-curated example with unescaped dots for IP regex (\d+.\d+.\d+.\d+) so the fine-tuned model natively learns correct OS_Regex IP syntax instead of relying on post-processing.

…gex instructions
- Add _stream_ai_sanitized to post-process AI output and fix \d+\.\d+ → \d+.\d+ (common AI mistake: escaping dots before \d for IPs in Wazuh OS_Regex)
- Enhance _sanitize_decoder_xml_osregex to target IP patterns specifically, only removing \. between \d quantifiers, not valid \.+ any-char quantifiers
- Update Modelfile with clearer IP regex instructions and new example conversations
- Update Modelfile.finetune with more training examples (iptables, squid, UFW, TrafficLog, CEF Palo Alto, nginx, SSH, netfilter, KV log)
- Fix kv-log-fields decoder order to match extract_fields
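
A targeted sanitizer of the kind described here might look like the sketch below. It is an assumption about the approach, not the committed code, and `unescape_ip_dots` is a hypothetical name. In Python's `re`, a backslash followed by a literal dot is written `\\\.` in a raw string, whereas `\\.` would match a backslash followed by any character.

```python
import re

# Backslash + literal dot, only when it sits between `\d+` and another `\d`.
_ESCAPED_DOT_BETWEEN_DIGITS = re.compile(r"(?<=\\d\+)\\\.(?=\\d)")


def unescape_ip_dots(pattern: str) -> str:
    """Turn PCRE-style `\\d+\\.\\d+` into OS_Regex `\\d+.\\d+`, leaving `\\.+` untouched."""
    return _ESCAPED_DOT_BETWEEN_DIGITS.sub(".", pattern)


fixed = unescape_ip_dots(r"from (\d+\.\d+\.\d+\.\d+) \.+")
```

The lookbehind and lookahead keep the any-character form `\.+` intact, because it is not surrounded by `\d` tokens.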

…ex sanitizer
- Remove programmatic XML from ai_generate and ai_generate_validated — AI now generates from scratch using only analysis context
- Add _fix_osregex_bare_dot_quantifier: converts common AI mistakes (.+) → (\S+), .+ → \.+, .* → \.+ inside regex/prematch tags
- Update _OLLAMA_SYSTEM_PROMPT with explicit anti-pattern examples showing CORRECT vs WRONG OS_Regex patterns
- Strengthen _build_ai_prompt decoder_rules with OS_Regex constraints and anti-echo instructions
- Update Modelfile and Modelfile.finetune with anti-pattern section

…d bare-dot sanitizer
- Remove raw streaming output (#aiOut) from UI — only show final extracted XML
- Add Reference Field-to-Pattern Mapping in _build_ai_prompt: programmatic regex patterns as text guidance (not XML blocks AI can echo)
- Add _infer_osregex_type helper to suggest correct OS_Regex pattern per field
- Add _build_fallback_decoder: silently builds programmatic decoder when AI produces no valid XML (uses user inputs like field_hints)
- Add _fix_osregex_bare_dot_quantifier: sanitizes (.+) → (\S+), .+ → \.+, .* → \.+ inside regex/prematch tags
- Update ai_generate_validated to fall back when all retries fail
- Remove ai-stream-block CSS (unused)
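
The three bare-dot rewrites listed for `_fix_osregex_bare_dot_quantifier` can be sketched on a single pattern string. This is an illustrative version under the assumption that the rewrites are applied in the order shown; the real helper works on the content of regex/prematch tags, and the name `fix_bare_dot_quantifier` is hypothetical.

```python
import re

# An unescaped dot followed by + or *, i.e. a PCRE-style any-character quantifier.
_BARE_DOT_QUANT = re.compile(r"(?<!\\)\.[+*]")


def fix_bare_dot_quantifier(pattern: str) -> str:
    """Rewrite PCRE-style dot quantifiers into their OS_Regex equivalents."""
    # A captured `(.+)` becomes a non-whitespace capture.
    fixed = pattern.replace("(.+)", r"(\S+)")
    # Any remaining unescaped `.+` or `.*` becomes the OS_Regex any-character form `\.+`.
    return _BARE_DOT_QUANT.sub(lambda m: r"\.+", fixed)


rewritten = fix_bare_dot_quantifier("user=(.+) msg=.* end")
```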

…d-aid sanitization
AI now handles structure (decoder names, hierarchy, order tags) but regex patterns come from the proven programmatic engine. _inject_programmatic_regex matches each <regex> to the next <order> and replaces the content with the correct regex from analysis regex_order_pairs.
- Remove unreliable bare-dot/IP sanitizers for regex content
- _extract_xml_from_ai_response accepts regex_order_pairs param
- ai_generate and ai_generate_validated pass analysis data through
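
The regex injection idea, pairing each `<regex>` with the `<order>` that follows it and swapping in a known-good pattern, could be sketched like this. The function name, the dictionary keyed by the order string, and the string-based matching are all assumptions; the sketch also ignores XML escaping of the regex content.

```python
import re
from typing import Dict

# A <regex> element immediately followed by its <order> element.
_REGEX_ORDER = re.compile(
    r"(<regex[^>]*>)(.*?)(</regex>\s*<order>\s*)(.*?)(\s*</order>)", re.DOTALL
)


def inject_regex(xml: str, regex_by_order: Dict[str, str]) -> str:
    """Replace each <regex> body with the pattern registered for its <order> value."""

    def _swap(m: re.Match) -> str:
        order = ",".join(part.strip() for part in m.group(4).split(","))
        replacement = regex_by_order.get(order, m.group(2))  # keep the original if unknown
        return m.group(1) + replacement + m.group(3) + m.group(4) + m.group(5)

    # A function replacement avoids backslash processing in the inserted pattern.
    return _REGEX_ORDER.sub(_swap, xml)


ai_xml = (
    '<decoder name="myapp-event">\n'
    "  <parent>myapp</parent>\n"
    "  <regex>from (\\d+\\.\\d+\\.\\d+\\.\\d+)</regex>\n"
    "  <order>srcip</order>\n"
    "</decoder>"
)
patched = inject_regex(ai_xml, {"srcip": r"from (\d+.\d+.\d+.\d+)"})
```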

- Remove _inject_programmatic_regex, _build_fallback_decoder
- Remove _FIELD_PATTERN_MAP, _infer_osregex_type
- Remove Reference Field-to-Pattern Mapping from _build_ai_prompt
- _sanitize_decoder_xml_osregex reverts to band-aid fixes only
- _extract_xml_from_ai_response no longer takes regex_order_pairs
- ai_generate and ai_generate_validated have zero programmatic fallback

AI generates everything (structure + regex) independently.

…ript AI now generates everything (structure + regex) independently. Only band-aid sanitization remains: (.+) → (\S+), .+ → \.+, \d+\.\d+ → \d+.\d+. Add scripts/train_osregex.py — extracts 26 training pairs from Modelfile.finetune into JSONL format and provides training commands for Ollama 0.5+, Unsloth, llama.cpp, and Axolotl.

…regex correction
- scripts/generate_finetuning_data.py: downloads all 104 decoder + 133 rule XMLs from wazuh-ruleset, generates 806 training pairs (725 train / 81 val) in JSONL format
- app/main.py: _inject_correct_regex silently replaces AI <regex> content with analysis-derived patterns; _INTERNAL_FIELD_REGEX maps field names to correct OS_Regex
- scripts/train_osregex.py: points to new 806-example dataset
- .gitignore: add .cache_decoders/

…refix regex generation
- sanitizeOsRegex disabled: backend _inject_correct_regex already fixes patterns
- build_split_regexes_from_fields: better first-field prefix handling (no \.+ prefix for start-of-log fields); use (\.+) for multi-word/multi-token field values

…ead of preceding words

…on; expand CEF field aliases
- app.js: checkExistingDecoder() calls /api/analyze first and shows confirm() dialog if a builtin decoder already matches
- main.py: add 'source', 'destination', 'port' aliases for CEF field mapping

…gex token patterns
- AI prompt: provide correct prematch when no program_name is pre-decoded
- AI prompt: add both <program_name> and <prematch> strategy examples
- decoder_ml_enhanced.py: fix over-escaped regex tokens in enhanced tokenizer
- wazuh_logtest.py: use env vars with fallback defaults instead of hardcoded values
- .gitignore: exclude certs/, *.pem, *.key, *.crt
- README.md: make SSH config docs generic

Hasitha9796 and others added 30 commits August 20, 2025 17:33

README.md

3451028

Create custom-ticketing

2386f02

Create custom-ticketing.py

cc6f037

Create README.md

78f12de

Create custom-email.py

4923e48

Create Vagrantfile

0dd69ae

Create inventory.ini

ea00869

Create README.md

767d3d4

Update README.md

2612226

Update README.md

ec4bd0c

Update README.md

c3f62c6

Delete Custom email template directory

e430af0

Merge branch 'wazuh:main' into main

8bb083b

Added integration: Microsoft Teams Using Ticketing as a Service

361f955

Added ansible + vagrant deployement steps.

9f49dee

Delete Wazuh + Microsoft Teams Ticketing as a service directory

fbe1354

Delete wazuh-deployment-ansible-vagrant directory

77a855f

updated the folder names

21bda76

Merge branch 'wazuh:main' into main

f2ff40c

Merge branch 'wazuh:main' into main

78a205f

feat: add wazuh decoder rule tool integration

e09e897

fix(decoder): ensure CEF split decoders use user requested field name…

a3ee16b

…s and auto-enable split mode for CEF logs

feat(decoder): enable split decoder generation by default for all log…

c0b9192

… formats for more reliable extraction

fix(decoder): dynamically extract full field keys for non-CEF key=val…

e50b588

…ue logs instead of truncating prefixes

style(ui): clarify log input section instructions to indicate single-…

d1dbe2d

…source pattern learning

fix(decoder): support multiple program names and aggregate child deco…

cb7dc0d

…ders from all logs

fix(decoder): improve fallback regex generation to use IP specific pa…

dc77e20

…tterns and full preceding words instead of truncating prefixes

fix(decoder): detect numeric dynamic fields (e.g. IPs) in prefixes an…

973f686

…d generalize them to \d+ to prevent brittle anchors

Update README with HTTPS setup instructions and refine regex generati…

bff6cb7

…on for decoders

Fix regex to correctly include spaces before punctuation in dynamic d…

f74c97a

…ecoder prefixes

Copilot started reviewing on behalf of nicolascurioni May 25, 2026 13:13

Copilot AI reviewed May 25, 2026

Hasitha9796 added 27 commits May 26, 2026 16:58

Improve AI generation pipeline, tune Ollama prompt, fix completions e…

0b42a17

…ndpoint and automate rule group/static field sanitization

fix(ui): add missing active class to AI view so it shows on page load

07fa10b

The sidebar defaulted to AI Generate as active, but the corresponding #view-ai div was missing the 'active' class, so CSS display:none kept the entire AI generation page blank on initial load.

chore: bump static file version for cache bust

3504e72

feat(train): add TrafficLog IP extraction example to finetuning data

6b96bff

Add a hand-curated example with unescaped dots for IP regex (\d+.\d+.\d+.\d+) so the fine-tuned model natively learns correct OS_Regex IP syntax instead of relying on post-processing.

fix: simplify prefix regex generation — use exact key= separator inst…

bd7a758

…ead of preceding words

MiguelCasaresRobles merged commit 6e6799b into wazuh:main Jun 8, 2026

MiguelCasaresRobles merged 82 commits into wazuh:main from Hasitha9796:main
